 voice input


GenAI Voice Mode in Programming Education

arXiv.org Artificial Intelligence

Real-time voice interfaces using multimodal Generative AI (GenAI) can potentially address the accessibility needs of novice programmers with disabilities (e.g., related to vision). Yet, little is known about how novices interact with GenAI tools and their feedback quality in the form of audio output. This paper analyzes audio dialogues from nine 9th-grade students using a voice-enabled tutor (powered by OpenAI's Realtime API) in an authentic classroom setting while learning Python. We examined the students' voice prompts and AI's responses (1210 messages) by using qualitative coding. We also gathered students' perceptions via the Partner Modeling Questionnaire. The GenAI Voice Tutor primarily offered feedback on mistakes and next steps, but its correctness was limited (71.4% correct out of 416 feedback outputs). Quality issues were observed, particularly when the AI attempted to utter programming code elements. Students used the GenAI voice tutor primarily for debugging. They perceived it as competent, only somewhat human-like, and flexible. The present study is the first to explore the interaction dynamics of real-time voice GenAI tutors and novice programmers, informing future educational tool design and potentially addressing accessibility needs of diverse learners.


Microsoft adds AI voice chat to Bing on desktop

Engadget

You can now talk to Bing on desktop, and it can even read its replies to you out loud. Microsoft has rolled out voice support for the search engine's chatbot on Edge for PCs, which is powered by OpenAI's GPT-4 technology. "We know many of you love using voice input for chat on mobile," the tech giant wrote in its latest Bing preview release notes. The feature first became available on Bing's AI chatbot for its mobile apps. Now it's also available on desktop -- you just need to tap on the mic icon in the Bing Chat box to talk to the AI-powered bot.


Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study

arXiv.org Artificial Intelligence

Detection of common events and scenes from audio is useful for extracting and understanding human contexts in daily life. Prior studies have shown that leveraging knowledge from a relevant domain is beneficial for a target acoustic event detection (AED) process. Inspired by the observation that many human-centered acoustic events in daily life involve voice elements, this paper investigates the potential of transferring high-level voice representations extracted from a public speaker dataset to enrich an AED pipeline. Towards this end, we develop a dual-branch neural network architecture for the joint learning of voice and acoustic features during an AED process and conduct thorough empirical studies to examine the performance on the public AudioSet [1] with different types of inputs. Our main observations are that: 1) Joint learning of audio and voice inputs improves the AED performance (mean average precision) for both a CNN baseline (0.292 vs 0.134 mAP) and a TALNet [2] baseline (0.361 vs 0.351 mAP); 2) Augmenting the extra voice features is critical to maximize the model performance with dual inputs.


Hacking Artificial Intelligence – Influencing and Cases of Manipulation

#artificialintelligence

Because the method has taken on such a central role, it is creating some major risks. This article discusses the extent to which AI can be hacked. The discussion presented here applies to both strong and weak AI applications in equal measure. In both cases, input is collected and processed before an appropriate response is produced. It doesn't matter whether the system is designed for classic image recognition, a voice assistant on a smartphone, or a fully automated combat robot.


Researchers Created AI That Hides Your Emotions From Other AI

#artificialintelligence

Humans can communicate a range of nonverbal emotions, from terrified shrieks to exasperated groans. Voice inflections and cues can communicate subtle feelings, from ecstasy to agony, arousal and disgust. Even when simply speaking, the human voice is stuffed with meaning, and a lot of potential value if you're a company collecting personal data. Now, researchers at the Imperial College London have used AI to mask the emotional cues in users' voices when they're speaking to internet-connected voice assistants. The idea is to put a "layer" between the user and the cloud their data is uploaded to by automatically converting emotional speech into "normal" speech.


Emotionless: Privacy-Preserving Speech Analysis for Voice Assistants

arXiv.org Machine Learning

Voice-enabled interactions provide more human-like experiences in many popular IoT systems. Cloud-based speech analysis services extract useful information from voice input using speech recognition techniques. The voice signal is a rich resource that discloses several possible states of a speaker, such as emotional state, confidence and stress levels, physical condition, age, gender, and personal traits. Service providers can build a very accurate profile of a user's demographic category and personal preferences, which may compromise privacy. To address this problem, a privacy-preserving intermediate layer between users and cloud services is proposed to sanitize the voice input. It aims to maintain utility while preserving user privacy. It achieves this by collecting real-time speech data and analyzing the signal to ensure privacy protection before the data is shared with service providers. Specifically, the sensitive representations are extracted from the raw signal using transformation functions and then wrapped via voice conversion technology. An experimental evaluation based on emotion recognition shows that identification of the speaker's sensitive emotional state is reduced by ~96%.


Microsoft patent suggests you whisper to your voice assistants

Engadget

While voice assistants have grown in popularity over recent years, many people still hesitate to use them in public spaces, and that's a problem Microsoft is looking to tackle. In a patent filing, the company notes that for a number of reasons -- not wanting to disturb those nearby, not wanting to share private information around strangers -- people often avoid issuing voice commands when in public. "Although performance of voice input has been greatly improved, the voice input is still rarely used in public spaces, such as office or even homes," says the patent filing. "These are not technical issues but social issues. Hence there is no easy fix even if voice recognition system performance is greatly improved."


Amazon wants to turn Alexa into a makeshift doctor

Daily Mail - Science & tech

Alexa may soon be able to act as an in-house doctor for poorly or upset users. A patent that was filed by Amazon reveals that Alexa will automatically detect unusual changes in a person's voice and speaking patterns. The AI-powered smart speaker will also pick up on auditory clues like coughs and moans, then offer suggestions to help aid a speedy recovery. These could include suggesting you eat a bowl of chicken soup, as well as offering to deliver cough tablets and tissues and to play you soothing music. Amazon has successfully obtained a patent which would allow Alexa to detect unusual changes in a person's voice caused by illness or crying.


Amazon Files for Patent to Detect User Illness and Emotional State by Analyzing Voice Data - Voicebot

#artificialintelligence

Amazon yesterday filed a patent with the U.S. Patent and Trademark Office related to detecting physical and emotional wellbeing of users based on interactions captured in voice data. The first example in the patent application depicts a user coughing while asking Alexa about being hungry. Alexa responds by suggesting a chicken soup recipe and when refused then offers to order cough drops with one-hour delivery. The voice recognition system is using sounds such as a cough or sniffle to determine if a user is unwell. However, the patent is not limited by these sounds and could be extended to different types of normal speech.